Many complex systems may be described not by one, but by a number of complex networks
mapped one on the other in a multilayer structure [1]. Because of the interactions and dependencies
between these layers, what is true for a single layer does not necessarily reflect the
state of the entire system. In this paper we study the robustness of three real-life examples of two-layer
complex systems that come from the fields of communication (the Internet), transportation
(the European railway system) and biology (the human brain). In order to cover the whole range of
features specific to these systems, we focus on two extreme policies of system response to failures: no
rerouting and full rerouting. Our main finding is that multilayer systems are much more vulnerable
to errors and intentional attacks than they seem to be from a single-layer perspective.

PACS numbers: 89.75.Hc, 89.75.Fb, 89.40.Bb, 89.20.Hh 



The robustness of a complex system can be defined 
by how it behaves under stress. There are two general 
categories of such stress: errors - failures of random com- 
ponents, and attacks - failures of components that play a 
vital role in the system. Recently, many complex systems
have been successfully described in terms of complex networks.
These graphs may differ greatly in their response to failures.
For instance, 'scale-free' networks (i.e., networks whose
node degree distribution is heavy-tailed) such as the World
Wide Web, the Internet, protein networks, ecological networks
or cellular networks exhibit remarkable robustness to errors
but, at the same time, are very vulnerable to attacks such as
the removal of the most highly connected nodes [4, 5]. Subsequent
studies of other attack strategies [8, 9], cascading
failures [10, 11], defensive strategies [10, 12, 13, 14, 15],
and the vulnerability of weighted networks gave us valuable
insights into the robustness of complex networks
treated as distinct objects. Many such networks, however,
are only a part of larger systems, where a number
of coexisting topologies interact and depend on each
other. For instance, in the Internet, a graph formed
by an application (such as WWW or Peer-To-Peer) is 
mapped onto the IP network that, in turn, is mapped on 
a physical mesh of cables and optical fibers. The topol- 
ogy at every layer is different. Similarly, it is convenient 
to view a transportation network as a two-layer system, 
with a network of traffic demands mapped onto the phys- 
ical infrastructure. This layered view sheds a new light 
on the issue of the error and attack tolerance of many 
complex systems. We show in this paper that what is
observed at a single layer does not necessarily reflect
the state of the entire system. On the contrary, a tiny
disruption of the lower-layer graph that seems harmless
from a one-layer perspective may destroy a substantial
part of the upper-layer graph, rendering the whole system
useless in practice.

TABLE I: Two-layer systems analyzed in this paper: 'Railway' -
train traffic flows on top of the railway network of central
Europe; 'Gnutella' - Gnutella P2P network on top of the
AS-level Internet; 'Brain' - long-distance cortex-to-cortex
axonal connections in the human brain on top of the 3D lattice
in the white matter. $\langle l \rangle$ is the average shortest path length;
$\langle m \rangle$ is the average mapping length.



A framework for an analysis of layered complex networks
was recently introduced in [1]. In a two-layer
case, the system consists of a weighted logical graph
$G^\lambda = (V^\lambda, E^\lambda)$ and the underlying physical graph
$G^\phi = (V^\phi, E^\phi)$. The logical nodes are a subset of physical
nodes, $V^\lambda \subseteq V^\phi$. Every logical edge $e^\lambda = (u^\lambda, v^\lambda)$ is
mapped on the physical graph as a physical path $M(e^\lambda)$
connecting the nodes $u^\phi$ and $v^\phi$, corresponding to $u^\lambda$ and
$v^\lambda$.
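
As an illustration of these definitions, a two-layer system can be held in two plain dictionaries: one for the weighted logical edges and one for their mappings. The following is a minimal sketch on a hypothetical five-node physical graph. The node names, the helper `shortest_path`, and the choice of a breadth-first shortest path as the mapping are assumptions made for this example and are not prescribed by the framework.

```python
from collections import deque

def shortest_path(adj, src, dst):
    """Return one shortest path from src to dst as a list of nodes (BFS), or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):  # sorted only to keep the toy output deterministic
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Physical graph (hypothetical): undirected edges stored as an adjacency dictionary.
physical_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")]
adj = {}
for u, v in physical_edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Logical graph: weighted edges between a subset of the physical nodes.
logical_weights = {("a", "d"): 2, ("a", "e"): 1}

# Mapping: every logical edge is assigned a physical path between its end-nodes.
mapping = {le: shortest_path(adj, *le) for le in logical_weights}

print(mapping[("a", "d")])  # ['a', 'b', 'c', 'd']
print(mapping[("a", "e")])  # ['a', 'b', 'e']
```

In this toy system the physical edge a-b lies on both mappings, so it serves more than one logical edge.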

This layered framework allows us to study the robust- 
ness of the entire system. As the mapping of logical edges 
is usually longer than one hop, many physical links serve 
more than one logical edge (see Fig. 1). A failure of such
a physical link affects all logical edges that are mapped 
on it. In other words, failures at the physical layer prop- 
agate to the logical layer, and at the same time they 
multiply. Moreover, the resulting failures at the logical 
layer are strongly correlated in time and space. These 
three phenomena make the response of a layered system 
to failures much more complex than what is observed at 
a single layer. 

In our study we use three large examples of layered sys- 
tems that come from fields as different as transportation, 
communication and biology. We present an overview of 
these data sets in Table I and describe each of them
below. 

Our first data set, called 'Railway', is the European
railway system. It is extracted from timetables of 60'775
trains in central Europe using a previously described
algorithm. The resulting physical graph reflects the real-life
infrastructure that consists of 4'853 nodes (stations)
and 5'765 edges (rail tracks). The logical graph contains
7'038 edges, each connecting the first and the last station
of a train. The logical edge weight is the number of trains
following the same route. The route itself is the mapping
of this edge on the physical graph.

FIG. 1: Illustration of failure propagation, multiplication, and
correlation in a two-layer system. A single failure in the physical
graph results in three correlated failures in the logical
graph.

The second data set, called 'Gnutella', is an example 
of a large Peer-To-Peer (P2P) application in the Inter- 
net. In a P2P system the links between users are virtual 
and therefore they are usually created independently of 
the underlying Internet structure, forming a very different
topology. Due to its immense size and dynamics, the
exact map of the Internet at the IP level (i.e., where the
nodes are IP routers and hosts) is still beyond our reach.
Therefore we focus on its aggregated version, where each
node is an Autonomous System (AS), usually an Internet
Service Provider, and where edges reflect the connections
between the ASes. The topology of this AS-level
Internet is well known thanks to numerous Internet mapping
projects such as DIMES or CAIDA. For our
physical graph we take the 09/2004 topology provided by
CAIDA, which consists of 16'911 nodes and 37'849 edges.
For the logical graph we take a snapshot of the Gnutella 
P2P network collected in September 2004 by the crawler 
developed in [23]. It consists of around 1 million users,
connected by several million links. In order to obtain the 
AS-level version of this network, we translated the IP ad- 
dresses of the users into the corresponding AS numbers. 
All users with the same AS number become one node in 
the logical graph, and all links connecting the same pair 
of ASes become one edge of weight equal to the number 
of contributing links. As a result we obtain an AS-level 
logical graph of Gnutella with 1'214 nodes and 31'193 
edges. The mapping of each logical edge is obtained by
the shortest path in the physical graph connecting its
end-nodes.
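
The aggregation step described above can be sketched as follows. This is a minimal illustration, not the tooling used for the data set: the function name `aggregate_to_as_level`, the sample IP addresses and AS numbers (taken from ranges reserved for documentation) are hypothetical, and dropping links whose two users fall in the same AS is an assumption, since the text does not say how such links are treated.

```python
from collections import Counter

def aggregate_to_as_level(user_links, ip_to_as):
    """Collapse user-level links into weighted AS-level logical edges."""
    weights = Counter()
    for ip_u, ip_v in user_links:
        as_u, as_v = ip_to_as[ip_u], ip_to_as[ip_v]
        if as_u == as_v:
            continue  # assumption: a link inside one AS creates no logical edge
        # One undirected edge per AS pair; its weight counts contributing links.
        weights[tuple(sorted((as_u, as_v)))] += 1
    return weights

# Hypothetical lookup table from user IP address to AS number.
ip_to_as = {
    "192.0.2.1": 64496,
    "192.0.2.2": 64496,
    "198.51.100.7": 64497,
    "203.0.113.9": 64498,
}
user_links = [
    ("192.0.2.1", "198.51.100.7"),
    ("192.0.2.2", "198.51.100.7"),
    ("192.0.2.1", "192.0.2.2"),
    ("198.51.100.7", "203.0.113.9"),
]

as_edges = aggregate_to_as_level(user_links, ip_to_as)
as_nodes = {asn for pair in as_edges for asn in pair}
print(dict(as_edges))  # {(64496, 64497): 2, (64497, 64498): 1}
```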

Our third data set, called 'Brain', captures the large 
scale connectivity of the human brain. It was inferred 
from MRI scans with a previously described approach. In
particular, the brain cortex and the brain white matter 
are partitioned into a set of compact regions of compa- 
rable size. There are 1'013 regions in the cortex and 
3'432 regions in the white matter. Every region becomes 
a node in the physical graph. The logical edges in this 
data set are the long distance axonal connections between 
the 1'013 regions in the cortex. Each such connection 
traverses the white matter; the sequence of white matter
regions on its path defines the mapping $M(e^\lambda)$. At
the physical layer, two nodes are connected by a physical
edge $e^\phi$ if they appear directly connected (i.e., are
consecutive in the sequence of regions) in at least one
mapping $M(e^\lambda)$. By this procedure we have obtained a
two-layer structure, where the logical graph consists of 
the long-range connections in the brain and is mapped 
on the physical layer that reflects the '3D white matter 
structure' used to establish these long-range connections. 
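
The construction of the physical layer from the mappings can be sketched in a few lines. The region labels (`c1`, `w1`, ...) and the function name `physical_edges_from_mappings` are hypothetical and serve only to illustrate the rule that two regions are joined by a physical edge when they are consecutive in at least one mapping.

```python
def physical_edges_from_mappings(mappings):
    """Collect an undirected physical edge for every pair of consecutive regions."""
    edges = set()
    for path in mappings.values():
        for u, v in zip(path, path[1:]):
            edges.add(frozenset((u, v)))
    return edges

# Hypothetical mappings: cortex regions c*, white-matter regions w*.
mappings = {
    ("c1", "c2"): ["c1", "w1", "w2", "c2"],
    ("c1", "c3"): ["c1", "w1", "w3", "c3"],
}

physical_edges = physical_edges_from_mappings(mappings)
print(len(physical_edges))  # 5: the segment c1-w1 is shared by both mappings
```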

Of course, many real-life systems have mechanisms to 
partially or fully recover from failures. For instance, the
Internet consists of several layers (seven in the classic
view) that are specified in the ISO/OSI network
model [22]. Some of these layers, e.g., the 'network layer'
with its IP protocol, attempt to find an alternative path
around a failing link or node. This requires, among other
things, that the physical graph be connected. The situation
gets more difficult in railway networks, because for a train 
its entire path is important, not only the end-points. Al- 
though it is sometimes possible to slightly change the 
itinerary of the train or to organize alternative means of 
transportation (e.g., a bus) around the failing section, 
the common practice is to halt all the trains that use it. 
In order to keep our analysis general and to cover the 
whole spectrum of possible situations, in this paper we 
study two extreme policies: no rerouting and full rerouting.
In the former case we immediately delete all logical
edges affected by a physical failure. In the latter case,
we delete an affected logical edge $e^\lambda$ only when there is
no path in the physical graph $G^\phi$ between the end-nodes
of $e^\lambda$ (i.e., the end-nodes of $e^\lambda$ belong to different
components of $G^\phi$). Otherwise, the logical edge remains in
the graph, and its mapping is updated to the shortest
path in $G^\phi$. Consider the example in Fig. 1. Under the
no rerouting policy, three logical edges are removed after
the failure of a single physical edge. However, as the physical
graph $G^\phi$ is still connected, under the full rerouting policy all
three logical links can be rerouted and thus remain in the
logical graph.
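
The two policies can be contrasted on a small example. The sketch below uses a hypothetical four-node ring rather than the graph of Fig. 1, and the names `edge`, `find_path` and `apply_failure` are illustrative. A breadth-first search stands in for the shortest-path computation.

```python
from collections import deque

def edge(u, v):
    """Undirected physical edge as an order-independent key."""
    return frozenset((u, v))

def find_path(edges, src, dst):
    """Shortest path (BFS) between src and dst over the given edge set, or None."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def apply_failure(phys_edges, mapping, failed, reroute):
    """Remove one physical edge and update the logical layer under a policy."""
    remaining = phys_edges - {failed}
    new_mapping = {}
    for logical_edge, path in mapping.items():
        used = {edge(u, v) for u, v in zip(path, path[1:])}
        if failed not in used:
            new_mapping[logical_edge] = path      # unaffected logical edge
        elif reroute:
            alt = find_path(remaining, *logical_edge)
            if alt is not None:
                new_mapping[logical_edge] = alt   # rerouted on a new shortest path
        # otherwise the logical edge is deleted
    return remaining, new_mapping

ring = {edge("a", "b"), edge("b", "c"), edge("c", "d"), edge("d", "a")}
mapping = {("a", "b"): ["a", "b"], ("a", "c"): ["a", "b", "c"]}

_, no_rerouting = apply_failure(ring, mapping, edge("a", "b"), reroute=False)
_, full_rerouting = apply_failure(ring, mapping, edge("a", "b"), reroute=True)
print(no_rerouting)    # {}
print(full_rerouting)  # {('a', 'b'): ['a', 'd', 'c', 'b'], ('a', 'c'): ['a', 'd', 'c']}
```

Both logical edges are lost without rerouting, while with full rerouting both survive on longer paths because the ring stays connected.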

By studying the two extreme policies, no rerouting and
full rerouting, we also capture the specific features of our
three data sets. For instance, in the railway system every
rail track has a limited capacity that cannot be exceeded.
Therefore, even if we allow for rerouting, some routes will
be forbidden due to a possible overload. In the Gnutella
data set, the AS graph routing depends on the internal
policy of the involved ASes and the peering relationships
established between them. This results in routes that
are not necessarily the shortest possible, and makes some
of the routes invalid. These additional constraints imposed
on the Railway and Gnutella paths naturally limit
the performance of these systems below the 'full rerouting'
level. Finally, the brain has some ability to reroute
around broken connections too. However, this process
takes substantial time. Therefore, the initial response of
the brain is better described by the no rerouting
policy, but in time the brain slowly recovers and
reroutes some of the lost connections. This slow recovery
process can be observed in patients who have suffered,
e.g., a stroke, or have undergone brain surgery.

FIG. 2: Edge load distribution in three layered systems. The
main plots are in log-log scale (log-binned); the insets present
the same distributions in log-lin scale (lin-binned).

In other words, all responses of real systems to physical 
failures are located somewhere between the no rerouting 
and the full rerouting policy. This is especially important
because, as we show later, the difference between
these two extreme scenarios is often very small. 

Before we simulate the impact of failures on our sys- 
tems directly, let us try to roughly predict what will hap- 
pen by studying related distributions. In a layered sys- 
tem, every physical node or edge can be characterized by 
the load. The load $l$ of a physical node $v^\phi$ or edge $e^\phi$
is the sum of the weights of all the logical edges whose paths
traverse $v^\phi$ ($e^\phi$). The load becomes a very important
parameter when we allow for failures in the system.
Clearly, the higher the load of a failing physical compo- 
nent, the more it affects and perturbs the logical layer. If 
the load is distributed evenly in the physical graph, a ran- 
dom failure will not be very different from an intentional 
attack. If, however, the load distribution is very uneven, 
the highly loaded parts become an obvious target for an 
efficient attack. In Fig. 2 we present the load distribution
in the three layered systems we study. In each case the 
distribution is broad (covering 4-5 decades) and heav- 
ily right-skewed. This means that there is a significant 
number of physical links that carry a lot more traffic than 
the other links. Consequently, we can anticipate that an 
attack targeted on the most loaded links will harm the 
system much more efficiently than a random error sce- 
nario. 
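
The edge load defined above is straightforward to compute from the mappings. The following sketch uses a hypothetical toy system with two logical edges; the name `edge_loads` and the data are illustrative only.

```python
from collections import defaultdict

def edge_loads(mapping, weights):
    """Load of a physical edge = sum of weights of logical edges routed over it."""
    loads = defaultdict(int)
    for logical_edge, path in mapping.items():
        for u, v in zip(path, path[1:]):
            loads[frozenset((u, v))] += weights[logical_edge]
    return dict(loads)

# Hypothetical mappings and logical edge weights.
mapping = {("a", "d"): ["a", "b", "c", "d"], ("a", "e"): ["a", "b", "e"]}
weights = {("a", "d"): 2, ("a", "e"): 1}

loads = edge_loads(mapping, weights)
most_loaded = max(loads, key=loads.get)
print(sorted(most_loaded), loads[most_loaded])  # ['a', 'b'] 3
```

An attack in the sense used here would target `most_loaded` first, whereas a random error picks any physical edge with equal probability.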

We verify this intuition by simulating the error and 
attack scenarios on the three studied systems. The results
are presented in Fig. 3. Although the exact system
response varies between the three cases, there are a number of
features common to all or most of them:

1) The attacks are much more harmful than errors. For
example, in Gnutella with no rerouting, half of the logical
mass (total edge weight) is erased after 22% of the physical
edges randomly fail, or after only the 0.04% most loaded
edges are attacked. Although under the 'full rerouting'
policy this difference is smaller, we still need about 60
times more random failures than attacks to achieve the
same goal.

2) When the system is attacked, the logical graph is usually
affected much faster than the physical graph. For instance,
in Gnutella, an attack (with or without rerouting)
on 5% of the physical edges hardly affects the physical
graph: the largest connected physical component covers
almost the entire original graph. At the same time,
this seemingly harmless attack deletes more than 95%
of the logical edges! We obtain similar results when we consider
the size of the largest connected component in the
logical graph as the measure of robustness. (These results
are not shown in Fig. 3 for better readability.)

3) The attack under the full rerouting policy affects the
physical graph more than under no rerouting. When
rerouting is allowed, the logical edges are deleted only
when the physical graph gets partitioned. This, in turn,
effectively reduces the size of the largest connected physical
component. This phenomenon is especially pronounced
in the last plot in Fig. 3 (brain, attack tolerance).
Under full rerouting, the size of the largest component
in the physical graph (filled triangles) drops rapidly
after about 55% of physical edges are attacked. Clearly,
this component splits into two components of comparable
size. This behavior can be explained using the example
in Fig. 1. Initially, one physical edge is used by three
logical links. It is the most loaded edge in the physical
graph and hence it is removed first by our attack.
Now, under the no rerouting policy, three logical edges are
deleted. In what remains, the load is distributed equally
on four physical edges, so there is no preferred edge for
our attack. In particular, in the second round the attack
may remove a physical edge whose removal keeps the physical
graph connected. In contrast, under the full rerouting
policy, after the removal of the first edge the three affected
logical links are rerouted. As all of them must now traverse
the same physical edge, the load of that edge increases to 4
and it is removed in the second round of the attack. This
effectively splits the physical graph into two components of
three nodes each.

4) The logical graph is strongly affected by attacks regardless
of the rerouting policy. This is expressed by the
proximity of the filled and unfilled circles under attack
in Fig. 3 (especially for Railway and Gnutella). As any
real-life failure recovery policy falls between these two extremes
(no rerouting and full rerouting), we expect this
feature to be quite general.
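
The simulation procedure behind these observations can be sketched as a short loop: remove one physical edge per round (at random or the most loaded one), update the logical layer under the chosen policy, and record the size of the largest physical component and the remaining logical weight. The code below is a minimal sketch under stated assumptions. The six-node graph is a hypothetical toy, not the graph of Fig. 1; the function names are illustrative; loads are recomputed after every round, as the example in point 3 suggests; and a breadth-first search stands in for the shortest-path computation.

```python
import random
from collections import defaultdict, deque

def norm(u, v):
    return (u, v) if u <= v else (v, u)

def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def bfs_path(edges, src, dst):
    adj, parent, queue = adjacency(edges), {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def largest_component(nodes, edges):
    adj, seen, best = adjacency(edges), set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best

def path_edges(path):
    return {norm(u, v) for u, v in zip(path, path[1:])}

def simulate(nodes, phys_edges, mapping, weights, attack, reroute, steps, seed=0):
    """Return [(largest physical component, remaining logical weight)] per round."""
    rng = random.Random(seed)
    edges = {norm(u, v) for u, v in phys_edges}
    mapping = dict(mapping)
    history = []
    for _ in range(steps):
        if not edges:
            break
        if attack:  # most loaded physical edge, loads recomputed every round
            loads = defaultdict(int)
            for le, path in mapping.items():
                for e in path_edges(path):
                    loads[e] += weights[le]
            victim = max(sorted(edges), key=lambda e: loads[e])
        else:       # random error
            victim = rng.choice(sorted(edges))
        edges.remove(victim)
        for le, path in list(mapping.items()):
            if victim in path_edges(path):
                alt = bfs_path(edges, *le) if reroute else None
                if alt is None:
                    del mapping[le]
                else:
                    mapping[le] = alt
        history.append((largest_component(nodes, edges),
                        sum(weights[le] for le in mapping)))
    return history

# Hypothetical toy system: two triangles joined by two bridges (b-e and c-f).
nodes = ["a", "b", "c", "d", "e", "f"]
phys = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"),
        ("e", "f"), ("d", "f"), ("b", "e"), ("c", "f")]
mapping = {("a", "d"): ["a", "b", "e", "d"], ("b", "e"): ["b", "e"],
           ("c", "e"): ["c", "b", "e"], ("a", "c"): ["a", "c"],
           ("c", "f"): ["c", "f"]}
weights = {("a", "d"): 1, ("b", "e"): 1, ("c", "e"): 1, ("a", "c"): 2, ("c", "f"): 1}

attack_no_rerouting = simulate(nodes, phys, mapping, weights, True, False, 2)
attack_full_rerouting = simulate(nodes, phys, mapping, weights, True, True, 2)
print(attack_no_rerouting)    # [(6, 3), (6, 1)]
print(attack_full_rerouting)  # [(6, 6), (3, 2)]
```

In this toy case, rerouting concentrates the load on the one remaining bridge, so the second round of the attack splits the physical graph into two components of three nodes, while without rerouting the physical graph stays connected after two rounds.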

To conclude, the response of a multi-layer system to
failures is much more complex than what is observed at a
single layer. In particular, such systems are more vulnerable
than they seem to be from a single-layer perspective.
This is very important, because the multi-layer structure
is a model that fits many real-life systems well.

FIG. 3: Error and attack tolerance of three layered systems. At each iteration we remove one physical edge $e^\phi$, either at
random ('error tolerance', top), or by choosing the most loaded one ('attack tolerance', bottom). In both cases we observe the
size of the largest connected component in the physical graph $G^\phi$ (triangles) and the total weight of the remaining logical edges
(circles). Every logical edge $e^\lambda$ whose mapping contains $e^\phi$ is deleted either directly ('no rerouting', unfilled symbols), or only
when there is no path in $G^\phi$ between the end-nodes of $e^\lambda$ ('full rerouting', filled symbols).

This work is only the first step towards understand- 
ing the behavior of layered systems under stress. There 
are numerous aspects that require further investigation. 
What is the impact of traffic locality, weight and load distribution,
failure correlation, or topological properties at
the two layers on the robustness of the system? Do there 
exist attacks even more efficient than the one proposed 
in this paper? Is it possible to significantly improve the 
resilience of a system, e.g., by adding a relatively small 
number of physical or logical edges? We are planning to 
address these issues in our future work. 